Biostatistics For Dummies (Monika Wahi John Pezzullo)

The SE is usually written after a sample mean with a ± (read “plus or minus”) symbol followed by the

number representing the SE. As an example, you may express the mean and SE of blood glucose levels

measured in a sample of adult diabetics as 120 ± 3 mg/dL. By contrast, the CI is written as a

pair of numbers, known as confidence limits (CLs), separated by a dash. The CI for the same mean

blood glucose could be expressed like this: 114 – 126 mg/dL. Notice that the mean, 120 mg/dL,

falls in the middle of the CI. Also, note that the lower confidence limit (LCL) is 114

mg/dL, and the upper confidence limit (UCL) is 126 mg/dL. Instead of LCL and UCL, sometimes

abbreviations are used, written with a subscript L or U (as in $CL_L$ or $CL_U$), indicating the

lower and upper confidence limits, respectively.

Although SEs and CIs are both used as indicators of the precision of a numerical quantity, they

differ in what they intend to describe (the sample or the population):

An SE indicates how much your observed sample statistic may fluctuate if the same study is

repeated a large number of times, so the SE intends to describe the sample.

A CI indicates the range that’s likely to contain the true population parameter, so the CI intends to

describe the population.
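
The two descriptions above can be checked with a small simulation. The sketch below is only an illustration: the population mean and SD, the sample size, the number of repeated studies, and the use of a 95 percent normal-approximation CI are all assumptions chosen for the demo, not values from this chapter. It repeats the same imaginary study many times, then looks at how much the sample mean fluctuates (what the SE describes) and how often the CI contains the true population mean (what the CI describes).

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population and study design (assumptions for illustration only)
TRUE_MEAN, TRUE_SD = 120.0, 30.0   # mg/dL
N, REPEATS = 50, 2000              # people per study, number of repeated studies
Z95 = NormalDist().inv_cdf(0.975)  # about 1.96, the two-sided 95 percent critical value

sample_means = []
covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / N ** 0.5   # SE estimated from this one sample
    sample_means.append(mean)
    # Does this study's CI contain the true population mean?
    if mean - Z95 * se <= TRUE_MEAN <= mean + Z95 * se:
        covered += 1

spread_of_means = statistics.stdev(sample_means)  # observed fluctuation of the sample mean
coverage = covered / REPEATS                      # share of CIs that contain the true mean

print(f"Theoretical SE:             {TRUE_SD / N ** 0.5:.2f} mg/dL")
print(f"Observed spread of means:   {spread_of_means:.2f} mg/dL")
print(f"Share of CIs covering mean: {coverage:.1%}")
```

If the sketch behaves as intended, the observed spread of the sample means should land close to the theoretical SE, and the share of CIs that cover the true mean should land close to 95 percent.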

If you want to have a more precise estimate of your population parameter from your sample

statistic, it’s best if the SEs are small and the CIs narrow. One important property of both CIs and

SEs is that their size varies inversely with the square root of the sample size. For

example, if you were to blow up your sample size — let’s pretend to quadruple it — it would cut

the size of the SE and the width of the CI in half! This square root law is one of the most widely

applicable rules in all of statistics, and is the reason why you often hear researchers trying to find

ways to increase the sample size in their studies. In practice, a reasonable sample size is reached

based on budget and historical studies, because including the whole population is usually not

possible (or necessary).
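
To see the square root law in numbers, here is a minimal sketch that assumes the SE of a mean is calculated as the standard deviation (SD) divided by the square root of the sample size. The SD of 30 mg/dL and the sample sizes are hypothetical values picked so that the first SE matches the 3 mg/dL used in the blood glucose example; the function name standard_error is also just an illustrative choice.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """SE of a sample mean: the SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

# Hypothetical values: SD of 30 mg/dL, with the sample size quadrupled at each step
sd = 30.0
for n in (100, 400, 1600):
    se = standard_error(sd, n)
    print(f"n = {n:4d}  SE = {se:.2f} mg/dL")
```

Each time the sample size quadruples, the SE is cut in half (3.00, then 1.50, then 0.75 mg/dL), and the width of a CI built from that SE shrinks by the same factor.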

Understanding and interpreting confidence levels

The probability that the CI encompasses the true value of the population parameter is called the

confidence level of the CI. You can calculate a CI for any confidence level, but the most commonly

seen value is 95 percent. Whenever you report a CI, you must state the confidence level. As an

example, let’s restate our CI from the analysis of mean blood glucose levels in a sample of adult

diabetics to express that we used the 95 percent confidence level: 95 percent CI = 114 – 126 mg/dL.

In general, higher confidence levels correspond to wider confidence intervals (so you can have greater

confidence that the interval encompasses the true value), and lower confidence level intervals are

narrower. As an example, a 90 percent CI for the same data is a smaller range (115–125 mg/dL) and

the 99 percent CI is a larger range (112–128 mg/dL).
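
The three intervals above can be reproduced from the mean of 120 mg/dL and the SE of 3 mg/dL. The sketch below assumes a normal-based (large-sample) CI, meaning the mean plus or minus a critical value times the SE; a CI based on the Student t distribution would come out slightly wider for small samples. The function name normal_ci is a hypothetical helper, not something defined in this book.

```python
from statistics import NormalDist

def normal_ci(mean: float, se: float, level: float) -> tuple[float, float]:
    """Two-sided CI for a mean using the normal approximation (an assumption)."""
    # Critical value that leaves (1 - level) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se

# Mean and SE from the blood glucose example
for level in (0.90, 0.95, 0.99):
    low, high = normal_ci(120, 3, level)
    print(f"{level:.0%} CI: {low:.0f} to {high:.0f} mg/dL")
```

Rounded to whole numbers, this gives 115 to 125 mg/dL at 90 percent, 114 to 126 mg/dL at 95 percent, and 112 to 128 mg/dL at 99 percent, which shows the interval getting wider as the confidence level goes up.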

Although a 99 percent CI may be attractive, it can be hard to achieve in practice because a

considerably larger sample is needed to keep the interval narrow (by the square root law described earlier in this section). Also, the wide range it